1.   INTRODUCTION

In recent years, large and complex multiprocessor systems have been
designed and implemented. The architectural differences among these
systems can be characterized in terms of the way their functional units
are interconnected [1]. Some are common bus systems, others are crossbar
switch systems (e.g., C.mmp [2]) or multiport-memory systems (e.g.,
Prime [3]). We focus our attention here on the types of functional units
(processors) in the system. Some multiprocessor systems consist of
identical general purpose processors, which share the input job load
under the control of the job scheduler. Other multiprocessor systems
consist of special purpose processors (or functionally dedicated
processors), each of which is designed to execute a particular type of job
or task. (For example, these processors may be file management
processors, input-output control processors, and processors for high speed
computations.) These special purpose processors may be implemented
in hardware, firmware or software.

In this paper, both types of multiprocessor systems are modelled
using queueing theory and their performance is evaluated. Our purpose
here is not to extend known results in queueing theory itself, but
to determine the architectural merits of the two types of systems.
Moreover, in the case of the multiprocessor system with special purpose
processors, the optimal architecture is discussed.

The notation necessary for our discussion is defined in section 2.
In section 3, we discuss the models where the arrivals of different
types of tasks are statistically independent. In section 4, queueing
networks are used to model systems in which the execution of some tasks
cannot be started before the completion of some other tasks.


2.   NOTATIONS 

Let us assume that jobs arriving for service at the multiprocessor
system may be decomposed into a number of different tasks.
There are altogether N+1 different types of tasks. (For example,
consider a system in which some jobs may be decomposed into input tasks
followed by compilation, computation and output tasks while the other
jobs may be decomposed into input, sorting, merging and output tasks.
In this case, we say that there are 6 different types of tasks.) We
refer to the set of type i tasks as task set $J_i$.

It suffices here to consider the relative speeds of the processors.
In particular, to compare the merits of a system consisting of
different special purpose processors with the system consisting of
identical general purpose processors, we measure their relative speeds.
The relative speed of a special purpose processor with respect to a general
purpose processor is called the capacity of the special purpose processor.
For example, if a task takes $1/\mu$ units of time to be completed by a general
purpose processor, then it takes $(1/\mu)(1/C)$ units of time to be completed
by a special purpose processor with capacity C.

We refer to the time required to complete a task in a given system as
the execution time (or service time) of the task on that system.
In particular, the execution time of the task on a general purpose
processor is called the amount of work for that task. Hence, if a
special purpose processor with capacity C completes the given task within
t sec, then the amount of work for that task is tC units.


We measure the effectiveness of a multiprocessor system by the
total amount of work remaining in the system, that is, the total time
required for a general purpose processor to complete all tasks being
served and waiting for service in the system. This performance measure
is chosen since it depends only on the architecture of the system
and not on the queueing discipline used to schedule tasks on processors.

Let $\bar n_q$ be the mean queue length in a system consisting of a
processor with capacity C, and $E(S_r)$ be the mean residual time (that is, the
remaining execution time of the task in progress). The average total
amount of work remaining in the system is given by

$$\frac{1}{\mu}\,\bar n_q + C\,E(S_r) \qquad (2.1)$$

where $1/\mu$ is the average amount of work for the task. (Note that $\bar n_q$ and
$E(S_r)$ also depend on C.)
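As a small worked illustration of the definitions in this section, the following Python sketch converts between execution time and amount of work and evaluates eq. (2.1). The function names and the numeric values used with them are illustrative assumptions, not part of the paper.

```python
def execution_time(work, capacity):
    """Time a processor of the given capacity needs for `work` units of work."""
    return work / capacity


def amount_of_work(elapsed, capacity):
    """Work (in general purpose processor time units) completed in `elapsed` time."""
    return elapsed * capacity


def work_remaining(mean_queue_len, mu, capacity, mean_residual):
    """Eq. (2.1): (1/mu) * n_q + C * E(S_r)."""
    return mean_queue_len / mu + capacity * mean_residual
```

For instance, a task with 4 units of work on a processor of capacity 2 has an execution time of 2, and a processor of capacity 2 that runs for 3 time units completes 6 units of work.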


3.   THE  SYSTEMS  WITH  INDEPENDENT 
INPUT  PROCESS 


In  this  section,  we  consider  the  queueing  models  of  multiprocessor 
systems  in  which  the  arrival  processes  of  the  different  types  of  tasks  are 
statistically  independent.   It  is  difficult  to  analyze  the  general 
behavior  of  multiprocessor  queues  except  for  the  simple  case  where  the 
arrival  process  is  Poisson  and  service  time  is  exponentially  distributed. 
The  assumption  of  exponential  distribution  for  service  time  being  invalid 
in  our  case,  we  are  forced  to  use  various  approximation  methods.   Again, 
our  purpose  in  this  section  is  to  evaluate  the  relative  effectiveness  of 
different  multiprocessor  system  architectures. 

3.1  The  Deterministic  Task  Execution  Time 

In this model, we assume that the amount of work required for all
tasks is constant and identical. Furthermore, the interarrival time of
jobs is exponentially distributed. Each of these jobs is decomposed into
a number of tasks as shown in Figure 3.1. We say that these tasks are
generated by the job. Let $T_i$ denote the number of tasks of type i
generated by a job. Then the total number of tasks generated by the job
is $T = T_0 + T_1 + \cdots + T_N$. We assume, furthermore, that the $T_i$'s are
statistically independent random variables. Let us denote the generating
function of random variable $T_i$ by $A_i(Z)$. Then, the generating function
of T is $\prod_{i=0}^{N} A_i(Z)$.


Figure 3.1 Decomposition of a Job (number of tasks plotted against task type)


3.1.1   The  Performance  of  a  Multiprocessor  System 
with  General  Purpose  Processors 

This  system  consists  of  m  general  purpose  processors  as 
described  in  Figure  3.2.   Since  all  processors  are  identical,  a  task 
can  be  executed  by  any  of  the  m  processors.   Therefore,  the  job 
scheduler  assigns  a  waiting  task  to  any  processor  when  it 
becomes  idle.   When  all  processors  are  busy,  the  task  joins  the  common 
queue. 

Let us define the amount of work for a task as a unit of time or
a time slot (see Figure 3.3). If the interarrival time of jobs is
sufficiently long compared with this time slot, then we can approximate
the exponential distribution of the interarrival time by a geometric
distribution. Specifically, the probability that n jobs arrive
within the time duration $\Delta t$ starting from time t is:

$$p_n = \Pr\{n \text{ jobs arrived in } [t, t+\Delta t]\} = \frac{(\lambda \Delta t)^n}{n!}\, e^{-\lambda \Delta t}$$

If we assume that $\lambda$ is sufficiently small ($\lambda \ll 1$), then we can approximate
this probability distribution as:

$$p_0 = 1 - \lambda \Delta t, \qquad p_1 = \lambda \Delta t, \qquad p_n = 0 \ \text{ for } n = 2, 3, \ldots$$


Let $\Delta t = 1$. The expression above becomes

$$p_0 = 1 - \lambda, \qquad p_1 = \lambda, \qquad p_n = 0 \ \text{ for } n = 2, 3, \ldots$$

Figure 3.3 Time Slot

Let the generating function of the total number of tasks arrived within
a time slot be denoted by A(Z). A(Z) is given by:

$$A(Z) = (1-\lambda) + \lambda \prod_{i=0}^{N} A_i(Z) \qquad (3.1)$$
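Eq. (3.1) can be evaluated numerically by treating each generating function as a list of coefficients, so that the product of the $A_i(Z)$ becomes a convolution. The Python sketch below does this. The function names and the two task-count distributions in the example are hypothetical.

```python
def convolve(p, q):
    """Coefficients of the product of two polynomials (pmf of a sum)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def arrivals_per_slot(lam, task_pmfs):
    """Coefficients P_j of A(Z) = (1 - lam) + lam * prod_i A_i(Z).

    task_pmfs[i][k] is Pr{T_i = k}, i.e. the coefficients of A_i(Z).
    """
    total = [1.0]
    for pmf in task_pmfs:
        total = convolve(total, pmf)
    slot = [lam * c for c in total]
    slot[0] += 1.0 - lam
    return slot


def mean(pmf):
    return sum(k * p for k, p in enumerate(pmf))


# Hypothetical example: T_0 is always 1, T_1 is 1 or 2 with equal probability.
P = arrivals_per_slot(0.2, [[0.0, 1.0], [0.0, 0.5, 0.5]])
```

In this example the mean number of tasks arriving per slot, `mean(P)`, equals $\lambda E(T) = 0.2 \times 2.5$.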


We assume that the job scheduler assigns tasks to the processors
at the beginning of each time slot. If $t_n$ is the nth epoch of this time
slot, the total number of tasks at $t_n + 0$ forms a Markov chain. Since
the execution time of a task is 1 time slot, the total number of tasks at
$t_n + 0$ is equivalent to the amount of work in the system at $t_n + 0$. Let
the probability of the arrival of j tasks during one time slot be $P_j$
(i.e., $A(Z) = \sum_{j=0}^{\infty} P_j Z^j$). Assuming that the number of processors m is larger
than $\lambda E(T)$, we solve for the mean number of tasks in the system at
equilibrium.

Let $\Pi_{ij}$ be the transition probability of the total number of tasks
in the system, then

$$\Pi_{ij} = \begin{cases} 0 & \text{for } j < i - m \\ P_j & \text{for } i < m \\ P_{j-i+m} & \text{for } i \ge m,\ j \ge i - m \end{cases} \qquad (3.2)$$


Since at the equilibrium, the probability of having j tasks in the system is

$$\Pi_j = \sum_{i=0}^{\infty} \Pi_{ij}\, \Pi_i ,$$

the generating function P(Z) is

$$P(Z) = \sum_{j=0}^{\infty} \Pi_j Z^j = \sum_{j=0}^{\infty} Z^j \sum_{i=0}^{\infty} \Pi_{ij} \Pi_i = \sum_{i=0}^{\infty} \Pi_i \sum_{j=0}^{\infty} \Pi_{ij} Z^j .$$

From the condition in (3.2), we get

$$P(Z) = \Pi_0 \sum_{j=0}^{\infty} \Pi_{0j} Z^j + \Pi_1 \sum_{j=0}^{\infty} \Pi_{1j} Z^j + \cdots + \Pi_{m-1} \sum_{j=0}^{\infty} \Pi_{m-1,j} Z^j + \sum_{i=m}^{\infty} \Pi_i \sum_{j=0}^{\infty} \Pi_{ij} Z^j .$$

Thus,

$$P(Z) = \sum_{i=0}^{m-1} \Pi_i \sum_{j=0}^{\infty} P_j Z^j + \sum_{i=m}^{\infty} \Pi_i \sum_{j=i-m}^{\infty} P_{j-i+m}\, Z^{j-i+m} \cdot Z^{i-m} .$$

Since $A(Z) = \sum_{i=0}^{\infty} P_i Z^i$,

$$P(Z) = \sum_{i=0}^{m-1} \Pi_i A(Z) + \sum_{i=m}^{\infty} \Pi_i A(Z) Z^{i-m} = \sum_{i=0}^{m-1} \Pi_i A(Z) + \frac{A(Z)}{Z^m} \Big( P(Z) - \sum_{i=0}^{m-1} \Pi_i Z^i \Big) .$$

In other words,

$$P(Z) = \frac{A(Z)}{A(Z) - Z^m} \Big( \sum_{i=0}^{m-1} \Pi_i Z^i - Z^m \sum_{i=0}^{m-1} \Pi_i \Big) .$$

Let $F(Z) = \sum_{i=0}^{m-1} \Pi_i (Z^i - Z^m)$, then

$$P(Z) = \frac{A(Z) F(Z)}{A(Z) - Z^m} . \qquad (3.3)$$


Since $\lim_{Z \to 1} P(Z) = 1$, and both the numerator and the denominator of
eq. (3.3) vanish at Z = 1,

$$\lim_{Z \to 1} \frac{A(Z)F(Z)}{A(Z) - Z^m} = \lim_{Z \to 1} \frac{A'(Z)F(Z) + A(Z)F'(Z)}{A'(Z) - mZ^{m-1}} = \frac{A'(1)F(1) + A(1)F'(1)}{A'(1) - m} = 1 .$$

In other words,

$$A'(1)F(1) + F'(1) = A'(1) - m .$$

But F(1) = 0. Hence

$$F'(1) = A'(1) - m . \qquad (3.4)$$

However,

$$F'(Z) = \sum_{i=0}^{m-1} (iZ^{i-1} - mZ^{m-1})\, \Pi_i ,$$

and

$$-F'(1) = \sum_{i=0}^{m-1} (m - i)\, \Pi_i .$$

The sum on the right-hand side of the equation above is the mean number of idle
processors. Then, m + F'(1) is the mean number of busy processors. Therefore,
according to eq. (3.4),

$$A'(1) = E\{\text{number of busy processors}\} .$$
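The identity $A'(1) = E\{\text{number of busy processors}\}$ can be checked numerically by iterating the Markov chain of eq. (3.2) until it settles. The Python sketch below does so for a hypothetical arrival pattern (every job brings exactly 3 tasks, m = 2). Truncating the state space at a finite size is an assumption of the sketch, not something the paper does.

```python
def stationary(arrivals, m, states=200, sweeps=2000):
    """Approximate stationary distribution of the chain in eq. (3.2).

    arrivals[k] = P_k. The state space is truncated at `states`
    (an assumption of this sketch); overflow mass is lumped into the
    last state.
    """
    pi = [1.0] + [0.0] * (states - 1)
    for _ in range(sweeps):
        nxt = [0.0] * states
        for i, p in enumerate(pi):
            if p == 0.0:
                continue
            left = max(i - m, 0)  # tasks left after one slot of service
            for k, q in enumerate(arrivals):
                nxt[min(left + k, states - 1)] += p * q
        pi = nxt
    return pi


lam, m = 0.5, 2
arrivals = [1 - lam, 0.0, 0.0, lam]  # hypothetical: each job brings 3 tasks
pi = stationary(arrivals, m)
mean_arrivals = sum(k * q for k, q in enumerate(arrivals))  # A'(1)
mean_busy = sum(min(i, m) * p for i, p in enumerate(pi))
```

Here `mean_busy` and `mean_arrivals` should agree up to the truncation and iteration error.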

The average number of tasks in the system at each epoch is derived
in the appendix. It is given by:

$$P'(1) = \bar n = \sum_{i=0}^{m-2} \frac{1}{1 - Z_i} - \frac{m(m-1)}{2(m - A'(1))} + A'(1) + \frac{A''(1)}{2(m - A'(1))} \qquad (3.5)$$

where $Z_0, Z_1, \ldots, Z_{m-2}$ are the zeros of $(1 - Z^m / A(Z))$ within the unit circle. If
we assume that every job is decomposed into a number of tasks that is an
integer multiple of m, then we can solve explicitly for $\bar n$. In other words,
we assume that $\prod_{i=0}^{N} A_i(Z)$ is a polynomial of $Z^m$. (This model can be
reduced to the single processor case. But here we will derive the explicit
expression using eq. (3.5).) In this case, $\Pi_i = 0$ for i = 1, 2, ..., m-1,
and by the appendix the first term in eq. (3.5) can be expressed as $F''(1) / (2F'(1))$.


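Since

$$F''(1) = \frac{d^2 F(Z)}{dZ^2} \Big|_{Z=1} = \sum_{i=0}^{m-1} \Pi_i \{ i(i-1) Z^{i-2} - m(m-1) Z^{m-2} \} \Big|_{Z=1} ,$$

$$F''(1) + m(m-1) = \sum_{i=1}^{m-1} i(i-1)\, \Pi_i + \sum_{i=m}^{\infty} m(m-1)\, \Pi_i .$$

Let $\rho = \sum_{i=m}^{\infty} \Pi_i$. Then

$$F''(1) + m(m-1) = \sum_{i=1}^{m-1} i(i-1)\, \Pi_i + m(m-1) \rho .$$

Therefore, since $\Pi_i = 0$ for i = 1, 2, ..., m-1,

$$F''(1) = m(m-1)(\rho - 1) .$$

Meanwhile

$$A'(1) = \sum_{i=1}^{m-1} i\, \Pi_i + \sum_{i=m}^{\infty} m\, \Pi_i = m \rho .$$

Therefore

$$\frac{F''(1)}{2F'(1)} = \frac{m(m-1)(\rho - 1)}{2m(\rho - 1)} = \frac{m-1}{2} .$$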

Thus we get

$$\bar n = \frac{m-1}{2} - \frac{m(m-1)}{2(m - A'(1))} + A'(1) + \frac{A''(1)}{2(m - A'(1))} . \qquad (3.6)$$

According to eq. (3.4), the mean number of busy processors is
given by A'(1). Hence, $\bar n - A'(1)$ tasks are in the queue. For an
arbitrary time t, the mean residual execution time is given as 1/2. The
average amount of work in the system is given by:

$$\bar W_g = \bar n - A'(1) + \frac{1}{2} A'(1)$$

or

$$\bar W_g = \frac{m-1}{2} - \frac{m(m-1)}{2(m - A'(1))} + \frac{A'(1)}{2} + \frac{A''(1)}{2(m - A'(1))} \qquad (3.7)$$
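Eqs. (3.6) and (3.7) depend on the arrival process only through A'(1) and A''(1), so they are easy to evaluate directly. The Python sketch below is a minimal illustration, valid only under the assumption stated in the text that the number of tasks per job is a multiple of m. The function names and the example arrival pattern are hypothetical.

```python
def mean_tasks(m, a1, a2):
    """Eq. (3.6); a1 = A'(1), a2 = A''(1). Requires m > a1."""
    if m <= a1:
        raise ValueError("need m > A'(1)")
    slack = 2.0 * (m - a1)
    return (m - 1) / 2.0 - m * (m - 1) / slack + a1 + a2 / slack


def mean_work_general(m, a1, a2):
    """Eq. (3.7): queued tasks plus half a slot for each task in service."""
    return mean_tasks(m, a1, a2) - a1 + 0.5 * a1


# Hypothetical case: m = 3 and every job brings exactly 3 tasks,
# so A(Z) = 1 - lam + lam * Z**3.
lam = 0.25
a1 = 3 * lam  # A'(1)
a2 = 6 * lam  # A''(1) = lam * 3 * 2
n_bar = mean_tasks(3, a1, a2)
w_g = mean_work_general(3, a1, a2)
```

In this case the three tasks of a job are always served in the slot after they arrive, so the formulas reduce to $\bar n = 3\lambda$ and $\bar W_g = 3\lambda/2$.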
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m. 


where  m  >  A'(l).   Figure  3.4  describes  the  behavior  of  Wg  when 
A(Z)  =  1  -  A  +  AZ  ,  and  m  =  N  +  1  =  3. 


3.1.2  The  Performance  of  a  Multiprocessor  System 
with  Special  Purpose  Processors 

This system consists of N + 1 subsystems of special purpose
processors as described in Figure 3.5, where $m = m_0 + m_1 + \cdots + m_N$.
Since a special purpose processor can execute only one type of task, the
job scheduler assigns a type i task to the ith subsystem or lets it join the
ith queue.

Let us consider the number of tasks in the ith subsystem. If we
let $C_i$ be the capacity of each processor in this subsystem ($C_i \ge 1$), then
the execution time of a task is $1/C_i$. Therefore, if we define this
execution time as a time slot, then the number of tasks arrived to this
processor during one time slot is $(\lambda / C_i)\, T_i$.

Let $B_i(Z)$ be the generating function of the number of tasks arrived
during the execution of a task, and $P_i(Z)$ be the generating function of the
number of tasks in the ith subsystem. Then the same discussion as in the
previous section can be applied. According to eq. (3.3),

$$P_i(Z) = \frac{B_i(Z) F_i(Z)}{B_i(Z) - Z^{m_i}}$$

where

$$B_i(Z) = 1 - \frac{\lambda}{C_i} + \frac{\lambda}{C_i} A_i(Z) .$$

From eq. (3.5), the average number of tasks in the ith subsystem $\bar n_i$ is given
by:

$$P_i'(1) = \bar n_i = \sum_{k=0}^{m_i - 2} \frac{1}{1 - Z_k^{(i)}} - \frac{m_i(m_i - 1)}{2(m_i - B_i'(1))} + B_i'(1) + \frac{B_i''(1)}{2(m_i - B_i'(1))} ,$$

where the $Z_k^{(i)}$ are the zeros of $(1 - Z^{m_i} / B_i(Z))$ within the unit circle.

Figure 3.4 Curves $\bar W_s$ and $\bar W_g$


Thus, if we assume $A_i(Z)$ is a polynomial of $Z^{m_i}$, the average amount of
work in the ith subsystem $\bar W_i$ is given by equation (3.7):

$$\bar W_i = \frac{m_i - 1}{2} - \frac{m_i(m_i - 1)}{2(m_i - B_i'(1))} + \frac{B_i'(1)}{2} + \frac{B_i''(1)}{2(m_i - B_i'(1))}$$

where

$$B_i(Z) = 1 - \lambda / C_i + (\lambda / C_i)\, A_i(Z) .$$

Moreover, the total mean amount of work in this system $\bar W_s$ is given by:

$$\bar W_s = \sum_{i=0}^{N} \left\{ \frac{m_i - 1}{2} - \frac{m_i(m_i - 1)}{2(m_i - B_i'(1))} + \frac{B_i'(1)}{2} + \frac{B_i''(1)}{2(m_i - B_i'(1))} \right\} \qquad (3.8)$$

where

$$m = \sum_{i=0}^{N} m_i .$$
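Eq. (3.8) can be evaluated once $B_i'(1)$ and $B_i''(1)$ are known. By the definition of $B_i(Z)$ these are $(\lambda/C_i) A_i'(1)$ and $(\lambda/C_i) A_i''(1)$. The Python sketch below computes them from the distribution of $T_i$. The function names, the tuple layout of a subsystem and the example parameters are assumptions made for illustration.

```python
def factorial_moments(pmf):
    """Return (A'(1), A''(1)) for a pmf given as a coefficient list."""
    d1 = sum(k * p for k, p in enumerate(pmf))
    d2 = sum(k * (k - 1) * p for k, p in enumerate(pmf))
    return d1, d2


def mean_work_special(lam, subsystems):
    """Eq. (3.8). Each subsystem is a tuple (m_i, C_i, pmf of T_i)."""
    total = 0.0
    for m_i, c_i, pmf in subsystems:
        a1, a2 = factorial_moments(pmf)
        b1 = lam / c_i * a1  # B_i'(1)
        b2 = lam / c_i * a2  # B_i''(1)
        if m_i <= b1:
            raise ValueError("subsystem is overloaded")
        slack = 2.0 * (m_i - b1)
        total += (m_i - 1) / 2.0 - m_i * (m_i - 1) / slack + b1 / 2.0 + b2 / slack
    return total


# Hypothetical example: three single-processor subsystems of capacity 2,
# each job bringing exactly one task of each type.
w_s = mean_work_special(0.4, [(1, 2.0, [0.0, 1.0])] * 3)
```

With $m_i = 1$ the polynomial condition on $A_i(Z)$ holds for any distribution of $T_i$, which is why the example uses single-processor subsystems.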


In  particular,  ifC.  =C,  m.  =1,  N  +  1  =  m,  then  eq.  (3.8)  becomes 

,    m   A!(l)      CAV(l) 

WS  =  2   f  {~ C~~   +  (C-X-A^(l))} 

Figure  3.4  also  shows  the  behavior  of  the  curves  Ws  in  the  case 
N  +  1  =  m  =  3,  A.(Z)  =  Z  ,  m.  =  1  for  i  =  1,2,3.   That  is, 

Wg  =  3A  -  1  + 


1-2A 


77    *  /6  j.  2c     6C  , 
WS  =  2  {C  +  ^2A  +  ^ 
where $\lambda \le 1/2$ and $\lambda \le C/3$.

3.2  The  Heavy  Traffic  Approximation  with 
Exponential  Execution  Time 

In this model, we assume the amount of work required for type i
tasks is exponentially distributed with parameter $\alpha_i$. Furthermore, we
assume the arrived job is decomposed into N + 1 tasks of different types.
In other words, if we let $S_i$ be the amount of work required for the type i
task, then the total amount of work generated by the job is
$S = S_0 + S_1 + \cdots + S_N$ where the $S_i$'s are assumed to be statistically
independent random variables. Moreover, we assume the interarrival time of
two jobs is exponentially distributed with parameter $\lambda$, and the system is
heavily loaded.


3.2.1  The  Performance  of  a  Multiprocessor  System  with 
General  Purpose  Processors 

This system is similar to the one described in Figure 3.2. Let
us denote by $t_n$ the epoch when the nth job arrived. Let $W_n$ be the amount of
work remaining at $t_n$, $A_n$ be the amount of work completed and $B_n$ be the
amount of work arrived in the duration $[t_{n-1}, t_n]$. Then $W_{n+1}$ can be
expressed as

$$W_{n+1} = \begin{cases} W_n + B_n - A_n & \text{if } W_n + B_n - A_n \ge 0 \\ 0 & \text{if } W_n + B_n - A_n \le 0 \end{cases} \qquad (3.9)$$

Here, we assume heavy traffic condition. That is: the number of
tasks arrived during $[t_{n-1}, t_n]$ is sufficiently large so that all m
processors are busy all the time. Therefore, $A_n$ is independent of $W_n$.
Moreover, $A_n$ has the same distribution as mA where A is the interarrival
time of two jobs. Since A is exponentially distributed with parameter $\lambda$,
mA is also exponentially distributed with parameter $\lambda/m$. Also, since $B_n$ has
the same distribution as S, we can write eq. (3.9) as:

$$W_{n+1} = \begin{cases} W_n + (S - mA) & \text{if } W_n + S - mA \ge 0 \\ 0 & \text{if } W_n + S - mA \le 0 \end{cases} \qquad (3.10)$$

However, the expression on the right-hand side of this equation is the
waiting time of the (n+1)th job in the case when the system consists of
one processor, the execution time of a job is S, and the interarrival time
of jobs is mA. Thus, we see that the mean waiting time in the single
server system shown in Figure 3.6 is equivalent to the average amount of
work remaining in our original model.

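Since $\Pr\{A \le t\} = 1 - e^{-\lambda t}$, and $S = S_0 + S_1 + \cdots + S_N$ where
$\Pr\{S_i \le x\} = 1 - e^{-\alpha_i x}$,

$$E[mA] = \frac{m}{\lambda}$$

$$E[S] = \sum_{i=0}^{N} \frac{1}{\alpha_i}$$

$$E[S^2] = \Big( \sum_{i=0}^{N} \frac{1}{\alpha_i} \Big)^2 + \sum_{i=0}^{N} \Big( \frac{1}{\alpha_i} \Big)^2$$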

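In the case of a single server, the mean queue length and the mean residual
time are given by [4]:

$$\bar n_q = \frac{(\lambda/m)^2 E(S^2)}{2(1 - \rho)}$$

$$E(S_r) = \frac{(\lambda/m) E(S^2)}{2}$$

where $\rho = (\lambda/m) E(S) = (\lambda/m) \sum_{i=0}^{N} (1/\alpha_i)$. Therefore, as mentioned in
Section 2, the mean waiting time in queue of this single server
system is

$$E(S)\, \bar n_q + E(S_r) = \frac{(\lambda/m) E(S^2)}{2} \Big( \frac{\rho}{1 - \rho} + 1 \Big) .$$

Moreover, the mean amount of work remaining in our original system $\bar W_g$ is:

$$\bar W_g = \frac{(\lambda/m) E(S^2)}{2} \Big( \frac{1}{1 - \rho} \Big) = \frac{(\lambda/m)}{2} \left\{ \Big( \sum_{i=0}^{N} \frac{1}{\alpha_i} \Big)^2 + \sum_{i=0}^{N} \Big( \frac{1}{\alpha_i} \Big)^2 \right\} \Big( \frac{1}{1 - \rho} \Big)$$

where

$$\rho = (\lambda/m) \sum_{i=0}^{N} \Big( \frac{1}{\alpha_i} \Big)$$

or

$$\bar W_g = \frac{1}{2} \Big( \frac{\rho}{1 - \rho} \Big) \left\{ \sum_{i=0}^{N} \frac{1}{\alpha_i} + \frac{\sum_{i=0}^{N} (1/\alpha_i)^2}{\sum_{i=0}^{N} 1/\alpha_i} \right\} \qquad (3.11)$$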
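The single-server result above can be evaluated directly from the rates $\alpha_i$. The Python sketch below computes $\rho$ and $\bar W_g = (\lambda/m) E(S^2) / (2(1-\rho))$. The function name is hypothetical, and the arrival rate 1.5 is an arbitrary value chosen to give $\rho = 0.9$ with the parameters quoted for Figure 3.7.

```python
def heavy_traffic_work_general(lam, m, alphas):
    """Mean work remaining, W_g = (lam/m) E[S^2] / (2 (1 - rho)).

    lam: job arrival rate; m: number of general purpose processors;
    alphas[i]: rate of the exponential work distribution of type i tasks.
    Returns the pair (W_g, rho).
    """
    mean_s = sum(1.0 / a for a in alphas)                        # E[S]
    second_s = mean_s ** 2 + sum(1.0 / a ** 2 for a in alphas)   # E[S^2]
    rho = lam / m * mean_s
    if rho >= 1.0:
        raise ValueError("rho must be below 1")
    return (lam / m) * second_s / 2.0 / (1.0 - rho), rho


# Parameters quoted for Figure 3.7: 1/alpha_i = i + 1 for i = 0, 1, 2, m = 10.
alphas = [1.0, 0.5, 1.0 / 3.0]
work, rho = heavy_traffic_work_general(1.5, 10, alphas)
```

For these parameters $E(S) = 6$ and $E(S^2) = 50$, so the result can also be written as $(25/6)\,\rho/(1-\rho)$.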

3.2.2  The  Performance  of  a  Multiprocessor  System 
with  Special  Purpose  Processors 

This system is similar to the one described in Figure 3.5. Let
us denote by $C_i$ the capacity of each processor in the ith subsystem. Let
$m_i$ and $W_n^{(i)}$ be the number of processors and the amount of work remaining
at the nth epoch in the ith subsystem, respectively. By applying the
same arguments as in 3.2.1, we express $W_{n+1}^{(i)}$ as follows:

$$W_{n+1}^{(i)} = \begin{cases} W_n^{(i)} + S_i - C_i m_i A & \text{if } W_n^{(i)} + S_i - C_i m_i A \ge 0 \\ 0 & \text{if } W_n^{(i)} + S_i - C_i m_i A \le 0 \end{cases}$$

Under heavy traffic conditions, we derive the expression for the
waiting time in the queue in the case of a single processor system with
interarrival time of tasks being $C_i m_i A$ and the execution time for a job
being $S_i$. Since $S_i$ and $C_i m_i A$ are exponentially distributed, then:

$$E[m_i C_i A] = \frac{C_i m_i}{\lambda}$$

$$E[S_i] = \frac{1}{\alpha_i}$$

$$E[S_i^2] = \frac{2}{\alpha_i^2}$$

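Consequently, we get that the mean waiting time in queue for a single
processor system is equivalent to the average amount of work remaining in
the subsystem $\bar W_s^{(i)}$. Moreover, $\bar W_s^{(i)}$ is given by:

$$\bar W_s^{(i)} = \frac{(\lambda / C_i m_i)\, E(S_i^2)}{2} \Big( \frac{1}{1 - \rho_i} \Big) = \frac{\lambda}{C_i m_i \alpha_i^2} \Big( \frac{1}{1 - \rho_i} \Big)$$

where

$$\rho_i = (\lambda / C_i m_i) \frac{1}{\alpha_i} .$$

Thus the total average amount of work remaining, $\bar W_s$, is:

$$\bar W_s = \sum_{i=0}^{N} \bar W_s^{(i)} = \sum_{i=0}^{N} \frac{\lambda}{C_i m_i \alpha_i^2} \left( \frac{1}{1 - \dfrac{\lambda}{\alpha_i C_i m_i}} \right) \qquad (3.12)$$

or

$$\bar W_s = \lambda \sum_{i=0}^{N} \Big( \frac{1}{\alpha_i} \Big) \Big( \frac{1}{\alpha_i C_i m_i - \lambda} \Big) \qquad (3.13)$$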
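Eqs. (3.12) and (3.13) are two forms of the same sum, which the Python sketch below evaluates side by side. The function names and the example parameters are hypothetical.

```python
def work_special_312(lam, alphas, caps, procs):
    """Eq. (3.12): sum of lam / (C_i m_i alpha_i^2) * 1 / (1 - rho_i)."""
    total = 0.0
    for a, c, m_i in zip(alphas, caps, procs):
        rho_i = lam / (c * m_i * a)
        if rho_i >= 1.0:
            raise ValueError("subsystem is overloaded")
        total += lam / (c * m_i * a * a) / (1.0 - rho_i)
    return total


def work_special_313(lam, alphas, caps, procs):
    """Eq. (3.13): lam * sum of (1/alpha_i) / (alpha_i C_i m_i - lam)."""
    return lam * sum(1.0 / a / (a * c * m_i - lam)
                     for a, c, m_i in zip(alphas, caps, procs))


# Hypothetical example: two subsystems with capacity 2 and 1 or 2 processors.
w_312 = work_special_312(1.0, [1.0, 0.5], [2.0, 2.0], [1, 2])
w_313 = work_special_313(1.0, [1.0, 0.5], [2.0, 2.0], [1, 2])
```

In this example both subsystems have $\rho_i = 0.5$ and contribute 1 and 2 units of work respectively.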

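We can minimize the value of $\bar W_s$ with respect to $m_i$ under the
condition $m = \sum_{i=0}^{N} m_i$. Since $\bar W_s(m_0, m_1, \ldots, m_N)$ is a convex function, we can
use the Lagrange method. Let

$$\bar W(m_0, m_1, \ldots, m_N, \beta) = \bar W_s + \beta \Big( \sum_{i=0}^{N} m_i - m \Big) .$$

Solving $\partial \bar W / \partial m_i = 0$, we get

$$\frac{\lambda C_i}{(\alpha_i C_i m_i - \lambda)^2} = \beta .$$

Therefore,

$$m_i = \frac{1}{\alpha_i C_i} \left\{ \lambda + \sqrt{\frac{\lambda C_i}{\beta}} \right\} . \qquad (3.14)$$

Since $m = \sum_{i=0}^{N} m_i$,

$$m = \sum_{i=0}^{N} \frac{\lambda}{\alpha_i C_i} + \sum_{i=0}^{N} \frac{\sqrt{\lambda}}{\sqrt{\beta}\, \alpha_i \sqrt{C_i}} = \lambda \sum_{i=0}^{N} \frac{1}{\alpha_i C_i} + \sqrt{\frac{\lambda}{\beta}} \sum_{i=0}^{N} \frac{1}{\alpha_i \sqrt{C_i}} .$$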

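Thus

$$\sqrt{\frac{\lambda}{\beta}} = \frac{m - \lambda \sum_{i=0}^{N} \dfrac{1}{\alpha_i C_i}}{\sum_{i=0}^{N} \dfrac{1}{\alpha_i \sqrt{C_i}}} \qquad (3.15)$$

Therefore, from eqs. (3.14) and (3.15), we get:

$$m_i = \frac{1}{\alpha_i C_i} \left\{ \lambda + \sqrt{C_i}\; \frac{m - \lambda \sum_{j=0}^{N} \dfrac{1}{\alpha_j C_j}}{\sum_{j=0}^{N} \dfrac{1}{\alpha_j \sqrt{C_j}}} \right\} \qquad (3.16)$$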


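Furthermore, the minimum value of $\bar W_s$ is:

$$\mathrm{Min}(\bar W_s) = \lambda \sum_{i=0}^{N} \frac{1}{\alpha_i} \left[ \lambda + \sqrt{C_i}\; \frac{m - \lambda \sum_{j=0}^{N} \dfrac{1}{\alpha_j C_j}}{\sum_{j=0}^{N} \dfrac{1}{\alpha_j \sqrt{C_j}}} - \lambda \right]^{-1}$$

$$= \lambda \sum_{i=0}^{N} \frac{1}{\alpha_i \sqrt{C_i}} \cdot \frac{\sum_{j=0}^{N} \dfrac{1}{\alpha_j \sqrt{C_j}}}{m - \lambda \sum_{j=0}^{N} \dfrac{1}{\alpha_j C_j}} .$$

Therefore,

$$\mathrm{Min}(\bar W_s) = \Big( \sum_{i=0}^{N} \frac{1}{\alpha_i \sqrt{C_i}} \Big)^2 \frac{1}{\dfrac{m}{\lambda} - \sum_{j=0}^{N} \dfrac{1}{\alpha_j C_j}} \qquad (3.17)$$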


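If $C_i = C$ for all i, then eq. (3.16) can be simplified to:

$$m_i = \frac{m}{\alpha_i \sum_{j=0}^{N} \dfrac{1}{\alpha_j}} \qquad (3.18)$$

Obviously, when $C_i = C$, the value of $m_i$ which minimizes $\bar W_s$ is independent
of C. In this case, $\mathrm{Min}(\bar W_s)$ is given by:

$$\mathrm{Min}(\bar W_s) = \frac{1}{C} \Big( \sum_{j=0}^{N} \frac{1}{\alpha_j} \Big)^2 \Big( \frac{m}{\lambda} - \frac{1}{C} \sum_{j=0}^{N} \frac{1}{\alpha_j} \Big)^{-1} \qquad (3.19)$$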
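The allocation of eq. (3.16) and the minimum of eq. (3.17) can be computed as in the Python sketch below. The function names are hypothetical, and the arrival rate 1.5 is an arbitrary choice used with the parameters quoted for Figure 3.7. The Lagrange solution gives real-valued $m_i$; rounding to whole processors is not addressed here.

```python
from math import sqrt


def optimal_processors(lam, m, alphas, caps):
    """Eq. (3.16): real-valued m_i minimising W_s subject to sum(m_i) = m."""
    load = lam * sum(1.0 / (a * c) for a, c in zip(alphas, caps))
    if m <= load:
        raise ValueError("need m > lam * sum(1 / (alpha_i C_i))")
    spread = (m - load) / sum(1.0 / (a * sqrt(c)) for a, c in zip(alphas, caps))
    return [(lam + sqrt(c) * spread) / (a * c) for a, c in zip(alphas, caps)]


def minimum_work(lam, m, alphas, caps):
    """Eq. (3.17)."""
    root_sum = sum(1.0 / (a * sqrt(c)) for a, c in zip(alphas, caps))
    load = sum(1.0 / (a * c) for a, c in zip(alphas, caps))
    return root_sum ** 2 / (m / lam - load)


# Parameters quoted for Figure 3.7: 1/alpha_i = i + 1, C_i = C = 1, m = 10.
alphas = [1.0, 0.5, 1.0 / 3.0]
caps = [1.0, 1.0, 1.0]
procs = optimal_processors(1.5, 10, alphas, caps)
best = minimum_work(1.5, 10, alphas, caps)
```

With equal capacities the allocation is proportional to $1/\alpha_i$, as in eq. (3.18), so `procs` is approximately [1.67, 3.33, 5.0].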


Figure 3.7 shows the curves $\bar W_s = \mathrm{Min}(\bar W_s)$ and $\bar W_g$ in the case of
N + 1 = 3, $(1/\alpha_i) = i + 1$ for i = 0, 1, 2, $C_i = C$ and m = 10. As expected,
under heavily loaded conditions, the optimized system with special purpose
processors behaves better. Even in the case C = 1 (i.e., no improvement
of processor speed), the value of $\bar W_s$ is almost comparable to that of $\bar W_g$.
Obviously, we obtain this result since the architectural flexibility in
scheduling does not make much difference under heavily loaded conditions.

Figure 3.8 describes another case where N + 1 = m = 3,
$(1/\alpha_i) = (0.9)^i$, $C_i = C$ for i = 0, 1, 2. Therefore, $m_i = 1$ for all i. Since
all subsystems consist of a single processor, eq. (3.12) holds for any
value of $\rho_i$ (not necessarily $\rho_i \approx 1$ for all i). Obviously $\bar W_s$ cannot
be optimized in this case.

Figure 3.7 $\bar W_g$ and $\bar W_s$ under Heavily Loaded Condition
(for $\bar W_g$: $\bar W_g = \frac{25}{6} \frac{\rho}{1-\rho}$; for $\bar W_s$: $\rho_i = \rho$, $\bar W_s = 6 \frac{\rho}{1-\rho}$)

Figure 3.8 $\bar W_g$ and $\bar W_s$ under Heavily Loaded Condition

4.   THE SYSTEM WITH SERIAL INPUT PROCESS

In this section, we consider the model in which the interarrival
time of jobs is exponentially distributed with parameter $\lambda$. Each of the
arrived jobs consists of N + 1 different types of tasks which must be
executed in sequence. The amount of work for a type i task is
exponentially distributed with parameter $\alpha_i$. That is, if $S_i$ is the amount of work
for a type i task, $\Pr\{S_i \le t\} = 1 - e^{-\alpha_i t}$, $t \ge 0$.

4.1   The  Performance  of  a  Multiprocessor  System  with 
General  Purpose  Processors 

This system consists of m general purpose processors. It can be
described as in Figure 4.1. Since the tasks must be executed in order,
the execution time of a job S is:

$$S = S_0 + S_1 + \cdots + S_N$$

where $S_i$ is the amount of work for a type i task.

We again assume heavily loaded condition as in section 3.2.
Since the model is the same as that in section 3.2 in this case, from
eq. (3.11), the average amount of work in the system is

$$\bar W_g = \frac{1}{2} \Big( \frac{\rho}{1 - \rho} \Big) \left\{ \sum_{i=0}^{N} \frac{1}{\alpha_i} + \frac{\sum_{i=0}^{N} (1/\alpha_i)^2}{\sum_{i=0}^{N} 1/\alpha_i} \right\}$$

where $\rho = (\lambda/m) \sum_{i=0}^{N} \frac{1}{\alpha_i}$.

4.2  The Performance of a Multiprocessor System with
Special Purpose Processors

This system is described as in Figure 4.2. The ith subsystem
consists of $m_i$ special purpose processors where $m = m_0 + m_1 + \cdots + m_N$.
Since a special purpose processor can execute only one type of task, the
job must go through the series of N + 1 subsystems.

Let us consider the number of tasks in the ith subsystem with $C_i$
being the capacity of a processor in the ith subsystem ($C_i \ge 1$).
The execution time of a job at the processor in the ith subsystem is
exponentially distributed with parameter $C_i \alpha_i$. Therefore, by Burke's
theorem [5], the interarrival time of jobs to the ith subsystem is also
exponentially distributed with parameter $\lambda$. If we assume heavily loaded
condition (all processors are busy all the time), the average amount of
work in the ith subsystem $\bar W_s^{(i)\prime}$ is:

$$\bar W_s^{(i)\prime} = \frac{\lambda}{C_i m_i \alpha_i^2} \Big( \frac{1}{1 - \rho_i} \Big) \quad \text{where} \quad \rho_i = \frac{\lambda}{C_i m_i \alpha_i} .$$

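However, the jobs in the ith subsystem have an additional amount of work.
(That is, tasks of types i + 1, i + 2, ..., N.) Therefore, the amount of
work in the ith subsystem $\bar W_s^{(i)}$ is

$$\bar W_s^{(i)} = \bar n_i \sum_{j=i+1}^{N} \frac{1}{\alpha_j} + \bar W_s^{(i)\prime}$$

where $\bar n_i$ is the mean number of jobs in the ith subsystem. Since
$\bar n_i = \rho_i / (1 - \rho_i)$ [3], then we get:

$$\bar W_s^{(i)} = \frac{\rho_i}{1 - \rho_i} \sum_{j=i+1}^{N} \frac{1}{\alpha_j} + \frac{\lambda}{C_i m_i \alpha_i^2} \cdot \frac{1}{1 - \rho_i} .$$

Therefore, the total average amount of work $\bar W_s$ is:

$$\bar W_s = \sum_{i=0}^{N} \bar W_s^{(i)} = \sum_{i=0}^{N} \left\{ \Big( \frac{\rho_i}{1 - \rho_i} \Big) \Big( \sum_{j=i+1}^{N} \frac{1}{\alpha_j} \Big) + \frac{1}{\alpha_i} \Big( \frac{\rho_i}{1 - \rho_i} \Big) \right\} = \sum_{i=0}^{N} \sum_{j=i}^{N} \frac{1}{\alpha_j} \Big( \frac{\rho_i}{1 - \rho_i} \Big) \qquad (4.1)$$

where

$$\rho_i = \frac{\lambda}{C_i m_i \alpha_i} .$$

If $C_i = C$ for all i, then

$$\bar W_s = \sum_{i=0}^{N} \sum_{j=i}^{N} \frac{1}{\alpha_j} \left( \frac{\dfrac{\lambda}{m_i \alpha_i}}{C - \dfrac{\lambda}{m_i \alpha_i}} \right) \qquad (4.2)$$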
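Eq. (4.1) weights the congestion term $\rho_i/(1-\rho_i)$ of each stage by the work a job at that stage still has ahead of it. The Python sketch below evaluates it. The function name and the arrival rate 0.5 are hypothetical; the remaining parameters are those quoted for Figure 4.3 with C = 1.

```python
def tandem_work(lam, alphas, caps, procs):
    """Eq. (4.1): sum_i rho_i / (1 - rho_i) * sum_{j >= i} 1 / alpha_j."""
    total = 0.0
    for i, (a, c, m_i) in enumerate(zip(alphas, caps, procs)):
        rho_i = lam / (c * m_i * a)
        if rho_i >= 1.0:
            raise ValueError("subsystem is overloaded")
        remaining = sum(1.0 / b for b in alphas[i:])  # own task plus later ones
        total += rho_i / (1.0 - rho_i) * remaining
    return total


# Parameters quoted for Figure 4.3: 1/alpha_i = 0.9**i, m_i = 1, C_i = C.
alphas = [1.0 / 0.9 ** i for i in range(3)]
work = tandem_work(0.5, alphas, [1.0] * 3, [1, 1, 1])
```

Doubling the capacity of every stage halves each $\rho_i$, which lowers every term of the sum.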


Figure 4.3 describes the behavior of $\bar W_g$ and $\bar W_s$ in the case of N + 1 = m = 3,
$(1/\alpha_i) = (0.9)^i$, $C_i = C$.

Figure 4.3 $\bar W_g$ and $\bar W_s$ under Heavily Loaded Condition
(curves: $\bar W_g$, and $\bar W_s$ for C = 1, 1.2, 1.5 and 2)

APPENDIX

In this appendix, we derive the expression given by eq. (3.5),
the average amount of work in the equilibrium state for the system consisting
of m general purpose processors (Figure 3.2). Therefore, we need to solve
for $\lim_{Z \to 1} \frac{dP(Z)}{dZ}$, which is given by:

$$\lim_{Z \to 1} \frac{dP(Z)}{dZ} = \lim_{Z \to 1} \frac{(A(Z)F(Z))' (A(Z) - Z^m) - A(Z)F(Z) (A'(Z) - mZ^{m-1})}{(A(Z) - Z^m)^2}$$

Let N(Z) and D(Z) be the numerator and denominator respectively in the
fraction above. Since $\lim_{Z \to 1} N(Z) = F'(1) - F'(1) = 0$ and $\lim_{Z \to 1} D(Z) = 0$, we
need to find the expression for both $\frac{dN(Z)}{dZ}$ and $\frac{dD(Z)}{dZ}$. Clearly,

$$\frac{dD(Z)}{dZ} = 2 (A'(Z) - mZ^{m-1}) (A(Z) - Z^m) .$$

Therefore,

$$\lim_{Z \to 1} \frac{dD(Z)}{dZ} = 0 .$$

Also,

$$\frac{dN(Z)}{dZ} = 2AA'F' + A^2F'' - mZ^{m-1}(A'F + AF') - Z^m(A''F + 2A'F' + AF'') + FA\,(m(m-1)Z^{m-2}) + mZ^{m-1}(AF' + A'F)$$

$$= F''(A^2 - Z^mA) + F'(2AA' - 2A'Z^m) + F(m(m-1)Z^{m-2}A - Z^mA'') . \qquad (A\text{-}1)$$

Since $\lim_{Z \to 1} \frac{dN(Z)}{dZ} = \lim_{Z \to 1} \{F'(2A' - 2A')\} = 0$, we need to find $\frac{d^2N(Z)}{dZ^2}$ and $\frac{d^2D(Z)}{dZ^2}$.
Differentiating the expression in eq. (A-1), we have

$$\frac{d^2N(Z)}{dZ^2} = F'''(A^2 - Z^mA)$$
$$+ F''(2AA' - Z^mA' - mZ^{m-1}A + 2AA' - 2A'Z^m)$$
$$+ F'(2A'^2 + 2AA'' - 2(Z^mA'' + mZ^{m-1}A') + m(m-1)Z^{m-2}A - Z^mA'')$$
$$+ F(m(m-1)Z^{m-2}A' + m(m-1)(m-2)Z^{m-3}A - Z^mA''' - mZ^{m-1}A'') .$$

Since F(1) = 0, A(1) = 1,

$$\lim_{Z \to 1} \frac{d^2N(Z)}{dZ^2} = F''(1)(4A'(1) - m - A'(1) - 2A'(1)) + F'(1)(2A'^2(1) + 2A''(1) - 2A''(1) - 2mA'(1) + m(m-1) - A''(1))$$

$$\lim_{Z \to 1} \frac{d^2D(Z)}{dZ^2} = 2 \lim_{Z \to 1} (A'(Z) - mZ^{m-1})^2 = 2(A'(1) - m)^2 = 2F'^2(1) .$$

Hence,

$$N''(1) = F''(1)F'(1) + F'(1)(2A'(1)F'(1) + m(m-1) - A''(1)) ,$$
$$D''(1) = 2F'^2(1) .$$

Thus,

$$\lim_{Z \to 1} \frac{dP(Z)}{dZ} = \frac{F''(1)}{2F'(1)} + \frac{2A'(1)F'(1) + m(m-1) - A''(1)}{2F'(1)} . \qquad (A\text{-}2)$$

In other words, the mean number of tasks in the system is:

$$\bar n = \frac{F''(1)}{2F'(1)} + \frac{m(m-1)}{2F'(1)} + \frac{-A''(1)}{2F'(1)} + A'(1) \qquad (A\text{-}3)$$

where $F(Z) = \sum_{i=0}^{m-1} \Pi_i (Z^i - Z^m)$. In particular, if m = 1, then
$F'(1) = A'(1) - 1 = -(1 - \rho)$, where $\rho$ is the mean number of arrivals during
one time slot, and F''(1) = 0. Hence,

$$\bar n = \rho + \frac{A''(1)}{2(1 - \rho)} .$$

We now solve for the value of $\frac{F''(1)}{F'(1)}$. According to eq. (3.3), P(Z)
is expressed as follows:

$$P(Z) = \frac{A(Z)F(Z)}{A(Z) - Z^m} .$$

Since P(Z) is analytic and bounded within the unit circle, all
zeros of the denominator within the unit circle must coincide with those
of the numerator.

By Rouché's Theorem, F(Z) can be expressed as follows:

$$F(Z) = K (Z - 1)(Z - Z_0) \cdots (Z - Z_{m-2})$$

where $Z_i$ is a root of $A(Z) = Z^m$ inside the unit circle. Thus,

$$F'(Z) = K \Big\{ \prod_{i=0}^{m-2} (Z - Z_i) + (Z - 1) \sum_{i=0}^{m-2} \prod_{j \ne i} (Z - Z_j) \Big\} ,$$

$$F'(1) = K \prod_{i=0}^{m-2} (1 - Z_i) .$$

Therefore,

$$F''(Z) = K \Big\{ \sum_{i=0}^{m-2} \prod_{j \ne i} (Z - Z_j) + \sum_{i=0}^{m-2} \prod_{j \ne i} (Z - Z_j) + (Z - 1) \Big( \sum_{i=0}^{m-2} \prod_{j \ne i} (Z - Z_j) \Big)' \Big\} ,$$

$$F''(1) = K \Big\{ 2 \sum_{i=0}^{m-2} \prod_{j \ne i} (1 - Z_j) \Big\} .$$

Then, we get:

$$\frac{F''(1)}{F'(1)} = \frac{2K \sum_{i=0}^{m-2} \prod_{j \ne i} (1 - Z_j)}{K \prod_{i=0}^{m-2} (1 - Z_i)} = 2 \sum_{i=0}^{m-2} \frac{1}{1 - Z_i} .$$

Therefore, the mean number of tasks is:

$$\bar n = \sum_{i=0}^{m-2} \frac{1}{1 - Z_i} - \frac{m(m-1)}{2(m - A'(1))} + A'(1) + \frac{A''(1)}{2(m - A'(1))}$$

where $Z_0, Z_1, \ldots, Z_{m-2}$ are the zeros of $(1 - Z^m / A(Z))$ in the unit circle.
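The final expression can be evaluated numerically once the roots of $A(Z) = Z^m$ are known. The Python sketch below finds all roots of $A(Z) - Z^m$ with the Durand-Kerner iteration, a generic polynomial root finder chosen here for self-containment and not a method used in the paper. It then keeps the roots in the closed unit disc other than Z = 1. The function names, tolerances and example arrival distributions are assumptions.

```python
def poly_roots(coeffs, sweeps=500):
    """All roots of a polynomial (coefficients low to high), Durand-Kerner."""
    lead = coeffs[-1]
    monic = [c / lead for c in coeffs]
    n = len(monic) - 1
    roots = [(0.4 + 0.9j) ** k for k in range(n)]  # distinct starting points
    for _ in range(sweeps):
        updated = []
        for i, r in enumerate(roots):
            value = sum(c * r ** k for k, c in enumerate(monic))
            denom = 1.0
            for j, s in enumerate(roots):
                if j != i:
                    denom *= r - s
            updated.append(r - value / denom)
        roots = updated
    return roots


def mean_tasks_from_roots(arrivals, m):
    """Mean number of tasks from the roots of A(Z) = Z**m (eq. (3.5))."""
    poly = [0.0] * max(len(arrivals), m + 1)
    for k, p in enumerate(arrivals):
        poly[k] += p
    poly[m] -= 1.0                      # coefficients of A(Z) - Z**m
    while poly[-1] == 0.0:
        poly.pop()
    a1 = sum(k * p for k, p in enumerate(arrivals))            # A'(1)
    a2 = sum(k * (k - 1) * p for k, p in enumerate(arrivals))  # A''(1)
    # Assumed tolerances: keep roots in the closed unit disc, drop Z = 1.
    inside = [z for z in poly_roots(poly)
              if abs(z) <= 1.0 + 1e-9 and abs(z - 1.0) > 1e-6]
    first = sum(1.0 / (1.0 - z) for z in inside).real
    slack = 2.0 * (m - a1)
    return first - m * (m - 1) / slack + a1 + a2 / slack


n_single = mean_tasks_from_roots([0.7, 0.3], 2)           # one task per job
n_triple = mean_tasks_from_roots([0.5, 0.0, 0.0, 0.5], 2)  # three tasks per job
```

With m = 2 and at most one task arriving per slot no task ever waits, so `n_single` equals the arrival probability 0.3. In the second case the only root inside the unit circle is $(1 - \sqrt{5})/2$.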